Visual domain adaptation (DA) seeks to transfer models trained on a source domain to unseen, unlabeled domains under distribution shift, but approaches typically focus on adapting convolutional neural network architectures initialized with supervised ImageNet representations. In this work, we shift focus to adapting modern architectures for object recognition: the increasingly popular Vision Transformer (ViT), together with modern pretraining based on self-supervised learning (SSL). Inspired by the design of recent SSL approaches based on learning from partial image inputs generated via masking or cropping, either by learning to predict the missing pixels or by learning representational invariances to such augmentations, we propose PACMAC, a simple two-stage adaptation algorithm for self-supervised ViTs. PACMAC first performs in-domain SSL on pooled source and target data to learn task-discriminative features, and then probes the model's predictive consistency over a set of partial target inputs generated via a novel attention-conditioned masking strategy, to identify reliable candidates for self-training. Our simple approach yields consistent performance gains over competing methods that use ViTs and self-supervised initializations on standard object recognition benchmarks. Code is available at https://github.com/virajprabhu/pacmac
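To make the candidate-selection step concrete, below is a minimal sketch of attention-conditioned masking and consistency probing. The model interface (`return_attention`, `drop_patches`), the mask ratio, and the number of views are illustrative assumptions, not the released implementation.

```python
import torch


def attention_conditioned_masks(attn, num_views=3, keep_ratio=0.5):
    """Rank patches by attention and split the most-attended half into
    `num_views` disjoint sets of patches to drop."""
    num_patches = attn.shape[-1]
    ranked = attn.argsort(descending=True)            # most-attended first
    top = ranked[: int(num_patches * keep_ratio)]
    return [top[i::num_views] for i in range(num_views)]


@torch.no_grad()
def select_for_self_training(model, images):
    """Keep a target image only if its prediction agrees across all
    attention-conditioned partial views."""
    logits, attn = model(images, return_attention=True)   # assumed interface
    pseudo_labels = logits.argmax(dim=-1)
    reliable = torch.ones_like(pseudo_labels, dtype=torch.bool)
    for i in range(len(images)):
        for dropped in attention_conditioned_masks(attn[i]):
            partial = model(images[i : i + 1], drop_patches=dropped)
            reliable[i] &= partial.argmax(dim=-1)[0] == pseudo_labels[i]
    return pseudo_labels, reliable
```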
Most modern approaches for domain adaptive semantic segmentation rely on continued access to source data during adaptation, which can be infeasible due to computational or privacy constraints. We focus on source-free domain adaptation for semantic segmentation, in which a source model must adapt itself to a new target domain given only unlabeled target data. We propose Augmentation Consistency-guided Self-training (AUGCO), a source-free adaptation algorithm that uses the model's pixel-level predictive consistency across automatically generated views of each target image, together with model confidence, to identify reliable pixel predictions, and selectively self-trains on those. AUGCO achieves state-of-the-art results on three standard benchmarks for source-free adaptation in semantic segmentation, all within a simple-to-implement and fast-converging approach.
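A minimal sketch of the pixel-selection idea, assuming a generic `segmenter` and invertible augmentations; the thresholds and view construction here are illustrative, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F


@torch.no_grad()
def reliable_pixels(segmenter, image, views, conf_thresh=0.9):
    """views: list of (augment, invert) pairs mapping the image to a view
    and mapping view-space predictions back to image space."""
    probs = segmenter(image).softmax(dim=1)            # [1, C, H, W]
    pseudo = probs.argmax(dim=1)                       # [1, H, W]
    conf = probs.max(dim=1).values
    consistent = torch.ones_like(pseudo, dtype=torch.bool)
    for augment, invert in views:
        view_pred = segmenter(augment(image)).argmax(dim=1)
        consistent &= invert(view_pred) == pseudo      # per-pixel agreement
    # Self-train only on pixels that are consistent across views and
    # confidently predicted.
    mask = consistent & (conf > conf_thresh)
    return pseudo, mask


def self_training_loss(logits, pseudo, mask):
    ce = F.cross_entropy(logits, pseudo, reduction="none")
    return (ce * mask).sum() / mask.sum().clamp(min=1)
```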
Many real-world reinforcement learning tasks require control of complex dynamical systems that involve both costly data acquisition processes and large state spaces. In cases where the transition dynamics can be readily evaluated at specified states (e.g., via a simulator), agents can operate in what is often referred to as planning with a \emph{generative model}. We propose the AE-LSVI algorithm for best-policy identification, a novel variant of the kernelized least-squares value iteration (LSVI) algorithm that combines optimism with pessimism for active exploration (AE). AE-LSVI provably identifies a near-optimal policy \emph{uniformly} over an entire state space and achieves polynomial sample complexity guarantees that are independent of the number of states. When specialized to the recently introduced offline contextual Bayesian optimization setting, our algorithm achieves improved sample complexity bounds. Experimentally, we demonstrate that AE-LSVI outperforms other RL algorithms in a variety of environments when robustness to the initial state is required.
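As a rough illustration of how optimism and pessimism combine for active exploration, the sketch below queries the state with the largest gap between optimistic and pessimistic value estimates; the kernel posterior and confidence-bonus form are generic assumptions rather than the paper's exact construction.

```python
import numpy as np


def optimistic_pessimistic_q(post_mean, post_std, beta):
    """Upper and lower confidence Q-estimates from a kernel regression
    posterior over Q-values (beta scales the confidence width)."""
    return post_mean + beta * post_std, post_mean - beta * post_std


def next_query_state(states, q_ucb, q_lcb):
    """Pick the state with the largest optimistic-pessimistic value gap,
    so the policy becomes near-optimal uniformly over the state space
    rather than only along trajectories from a fixed initial state."""
    v_gap = q_ucb.max(axis=1) - q_lcb.max(axis=1)     # [num_states]
    return states[int(np.argmax(v_gap))]
```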
In this paper, we propose and showcase, for the first time, monocular multi-view layout estimation for warehouse racks and shelves. Unlike typical layout estimation methods, MVRackLay estimates multi-layered layouts, wherein each layer corresponds to the layout of a shelf within a rack. Given a sequence of images of a warehouse scene, a dual-headed Convolutional-LSTM architecture outputs segmented racks, and the front and top view layouts of each shelf within a rack. With minimal effort, such an output is transformed into a 3D rendering of all racks, shelves and objects on the shelves, giving an accurate 3D depiction of the entire warehouse scene in terms of racks, shelves and the number of objects on each shelf. MVRackLay generalizes to a diverse set of warehouse scenes with varying numbers of objects on each shelf, varying numbers of shelves, and in the presence of other such racks in the background. Further, MVRackLay shows superior performance vis-a-vis its single-view counterpart, RackLay, in layout accuracy, quantified in terms of the mean IoU and mAP metrics. We also showcase multi-view stitching of the 3D layouts, resulting in a representation of the warehouse scene with respect to a global reference frame, akin to a rendering of the scene from a SLAM pipeline. To the best of our knowledge, this is the first such work to portray a 3D rendering of a warehouse scene in terms of its semantic components - Racks, Shelves and Objects - all from a single monocular camera.
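As a toy sketch of what a dual-headed Convolutional-LSTM layout network could look like; the channel sizes, the cell design, and the two heads are illustrative assumptions, not the published architecture.

```python
import torch
import torch.nn as nn


class ConvLSTMCell(nn.Module):
    def __init__(self, in_ch, hid_ch):
        super().__init__()
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, 3, padding=1)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c


class DualHeadLayoutNet(nn.Module):
    def __init__(self, hid_ch=32, max_shelves=4):
        super().__init__()
        self.hid_ch = hid_ch
        self.encoder = nn.Conv2d(3, hid_ch, 3, padding=1)
        self.cell = ConvLSTMCell(hid_ch, hid_ch)
        # One output channel per shelf layer, one head per view.
        self.front_head = nn.Conv2d(hid_ch, max_shelves, 1)
        self.top_head = nn.Conv2d(hid_ch, max_shelves, 1)

    def forward(self, frames):                    # frames: [B, T, 3, H, W]
        b, t, _, height, width = frames.shape
        h = frames.new_zeros(b, self.hid_ch, height, width)
        c = frames.new_zeros(b, self.hid_ch, height, width)
        for step in range(t):                     # fuse the image sequence
            h, c = self.cell(self.encoder(frames[:, step]), (h, c))
        return self.front_head(h), self.top_head(h)
```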
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
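Since the checkpoints are openly released on the Hugging Face Hub, a model can be loaded with the standard `transformers` API; the small 560M variant shown here keeps the example lightweight, while the full 176B `bigscience/bloom` checkpoint uses the same interface but needs far more memory.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")

inputs = tokenizer("BLOOM is a 176B-parameter open-access", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```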
We present the design, development, and evaluation of HREyes: biomimetic communication devices which use light to communicate information and, for the first time, gaze direction from AUVs to humans. First, we introduce two types of information displays using the HREye devices: active lucemes and ocular lucemes. Active lucemes communicate information explicitly through animations, while ocular lucemes communicate gaze direction implicitly by mimicking human eyes. We present a human study in which our system is compared to the use of an embedded digital display that explicitly communicates information to a diver by displaying text. Our results demonstrate accurate recognition of active lucemes for trained interactants, limited intuitive understanding of these lucemes for untrained interactants, and relatively accurate perception of gaze direction for all interactants. The results on active luceme recognition demonstrate more accurate recognition than previous light-based communication systems for AUVs (albeit with different phrase sets). Additionally, the ocular lucemes we introduce in this work represent the first method for communicating gaze direction from an AUV, a critical aspect of nonverbal communication used in collaborative work. With readily available hardware as well as open-source and easily re-configurable programming, HREyes can be easily integrated into any AUV with the physical space for the devices and used to communicate effectively with divers in any underwater environment with appropriate visibility.
Traditional approaches to active mapping focus on building geometric maps. For most real-world applications, however, actionable information relates to semantically meaningful objects in the environment. We propose an approach to the active metric-semantic mapping problem that enables multiple heterogeneous robots to collaboratively build a map of the environment. The robots actively explore to minimize the uncertainties in both the semantic (object classification) and geometric (object modeling) information. We represent the environment using informative but sparse object models, each consisting of a basic shape and a semantic class label, and characterize the uncertainty empirically using a large amount of real-world data. Given a prior map, we use this model to select actions for each robot that minimize the uncertainties. The performance of our algorithm is demonstrated through multi-robot experiments in diverse real-world environments. The proposed framework is applicable to a wide range of real-world problems, such as precision agriculture, infrastructure inspection, and asset mapping in factories.
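A schematic sketch of the uncertainty-driven action selection described above; the object-model fields and the particular uncertainty measures are illustrative assumptions.

```python
import numpy as np


def total_uncertainty(objects):
    """Sum semantic (class-distribution entropy) and geometric (shape-
    parameter variance) uncertainty over sparse object models."""
    semantic = sum(-(o["class_probs"] * np.log(o["class_probs"] + 1e-12)).sum()
                   for o in objects)
    geometric = sum(o["shape_var"].sum() for o in objects)
    return semantic + geometric


def select_action(candidate_actions, predict_map):
    """Greedily pick, per robot, the action whose predicted posterior map
    has the lowest remaining uncertainty."""
    return min(candidate_actions,
               key=lambda a: total_uncertainty(predict_map(a)))
```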
In this comment, we present a simple alternative derivation of the "iteratively reweighted algorithm" for the fuzzy c-means problem. We show that the iterative steps derived for the IRW-FCM algorithm are nothing but the steps of the popular majorization-minimization (MM) algorithm. The derivation presented in this note is simpler and more straightforward: unlike the derivation of IRW-FCM, the derivation here does not involve introducing any auxiliary variables. Moreover, by showing that the steps of IRW-FCM are those of an MM algorithm, the inner loop of the IRW-FCM algorithm can be eliminated, and the algorithm can effectively be run as a "single-loop" algorithm. More precisely, the new MM-based derivation shows that a single inner iteration of IRW-FCM is sufficient to decrease the fuzzy c-means objective function, thereby speeding up the IRW-FCM algorithm.
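In standard FCM notation (fuzzifier m > 1), the MM argument can be summarized as follows; this is a compact reconstruction under the usual conventions, not the comment's verbatim derivation.

```latex
% Centers-only FCM objective after eliminating the memberships (m > 1):
\[
  f(C) = \sum_{i=1}^{N} \Big( \sum_{k=1}^{K} d_{ik}^{-1/(m-1)} \Big)^{1-m},
  \qquad d_{ik} = \lVert x_i - c_k \rVert^2 .
\]
% Each inner term is concave in (d_{i1}, \dots, d_{iK}), so its first-order
% expansion at the current iterate C^t is a global upper bound (a majorizer),
% with gradient exactly (u_{ik}^t)^m:
\[
  f(C) \le f(C^t) + \sum_{i,k} (u_{ik}^t)^m
  \left( \lVert x_i - c_k \rVert^2 - \lVert x_i - c_k^t \rVert^2 \right),
  \qquad
  u_{ik}^t = \frac{(d_{ik}^t)^{-1/(m-1)}}{\sum_{j} (d_{ij}^t)^{-1/(m-1)}} .
\]
% Minimizing this weighted least-squares majorizer has the closed form
\[
  c_k^{t+1} = \frac{\sum_{i} (u_{ik}^t)^m \, x_i}{\sum_{i} (u_{ik}^t)^m},
\]
% so a single reweighting step per outer iteration already decreases f(C).
```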
Fourier ptychographic microscopy (FPM) is an imaging procedure that overcomes the traditional space-bandwidth product (SBP) limit of conventional microscopes through computational means. It utilizes multiple images captured using a low numerical aperture (NA) objective and achieves high-resolution phase imaging through frequency-domain stitching. Existing FPM reconstruction methods can be broadly categorized into two approaches: iterative optimization-based methods, which are grounded in the physics of the forward imaging model, and data-driven methods, which typically employ feed-forward deep learning frameworks. We propose a hybrid model-driven residual network that combines knowledge of the forward imaging system with a deep data-driven network. Our proposed architecture, LWGNET, unrolls the traditional Wirtinger flow optimization algorithm into a novel neural network design that enhances the gradient image through complex-valued convolutional blocks. Unlike other conventional unrolling techniques, LWGNET uses fewer stages while performing at par with, or even better than, existing conventional and deep learning techniques, particularly for low-cost and low-dynamic-range CMOS sensors. This improvement in performance for low-bit-depth and low-cost sensors has the potential to bring down the cost of FPM imaging setups significantly. Finally, we show consistently improved performance on our collected real data.
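A toy, single-stage rendering of the unrolling idea: take a physics-based Wirtinger flow gradient step, then let a small learned block enhance the gradient image. The `forward_op`/`adjoint_op` imaging model and the refinement block are placeholders, not LWGNET's actual design.

```python
import torch
import torch.nn as nn


def wirtinger_gradient(x, y, forward_op, adjoint_op):
    """Gradient of 0.5 * sum((|Ax|^2 - y)^2) w.r.t. the complex object x."""
    field = forward_op(x)                         # complex field at the sensor
    residual = field.abs() ** 2 - y               # intensity mismatch
    return adjoint_op(residual * field)


class UnrolledStage(nn.Module):
    """One stage: physics-based gradient step plus a small conv block that
    refines the gradient (real/imag parts stacked as channels)."""

    def __init__(self):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 2, 3, padding=1),
        )
        self.step = nn.Parameter(torch.tensor(0.1))   # learned step size

    def forward(self, x, y, forward_op, adjoint_op):
        g = wirtinger_gradient(x, y, forward_op, adjoint_op)
        g_ri = torch.stack([g.real, g.imag], dim=1)   # [B, 2, H, W]
        g_ri = g_ri + self.refine(g_ri)               # learned gradient enhancement
        g = torch.complex(g_ri[:, 0], g_ri[:, 1])
        return x - self.step * g
```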
Since different people express their emotions in different ways, their annotations in terms of arousal and valence are inherently subjective. To address this, emotion annotations are typically collected from multiple annotators and averaged across annotators to obtain labels for arousal and valence. Beyond the average, however, the uncertainty of the labels is also of interest and should likewise be modeled and predicted for automatic emotion recognition. In the literature, label uncertainty modeling is often handled, for simplicity, by assuming a Gaussian distribution over the collected annotations. However, since the number of annotators is typically rather small due to resource constraints, we argue that the Gaussian approach is a rather crude assumption. In contrast, in this work we propose to model the label distribution using a Student's t-distribution, which allows us to account for the number of available annotations. With this model, we derive the corresponding loss function based on the associated Kullback-Leibler divergence and use it to train an estimator of the emotion label distribution, from which the mean and uncertainty can be inferred. Through qualitative and quantitative analysis, we show the benefits of the t-distribution over the Gaussian distribution. We validate our proposed method on the AVEC'16 dataset. The results show that our t-distribution-based approach improves over the Gaussian approach with state-of-the-art uncertainty modeling in speech-based emotion recognition, while achieving comparable or even faster convergence.
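An illustrative sketch, not the paper's analytic derivation: build the target label distribution from the n available annotations as a Student's t (the standard posterior predictive with nu = n - 1), and fit a predicted t-distribution by a Monte Carlo KL estimate, since the KL divergence between two t-distributions has no simple closed form.

```python
import math
import torch
from torch.distributions import StudentT


def target_label_distribution(annotations):           # annotations: [B, n]
    """Per-sample t-distribution over the true label given n annotations."""
    n = annotations.shape[1]
    loc = annotations.mean(dim=1)
    scale = annotations.std(dim=1) / math.sqrt(n)
    return StudentT(df=n - 1, loc=loc, scale=scale)


def kl_loss(pred_loc, pred_scale, annotations, num_samples=256):
    """Monte Carlo estimate of KL(target || predicted); gradients flow
    through the predicted location and scale."""
    p = target_label_distribution(annotations)
    q = StudentT(df=annotations.shape[1] - 1, loc=pred_loc, scale=pred_scale)
    x = p.sample((num_samples,))
    return (p.log_prob(x) - q.log_prob(x)).mean()
```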